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SUMMARY Using daily infection data for Hong Kong we 
explore the validity of a variety of models of disease propagation 
when applied to the SARS epidemic. Surrogate data methods 
show that simple random models are insufficient and that the 
standard epidemic susceptible-infected-removed model does not 
fully account for the underlying variability in the observed data. 
As an alternative, we consider a more complex small world net- 
work model and show that such a structure can be applied to re- 
liably produce simulations quantitative similar to the true data. 
The small world network model not only captures the appar- 
ently random fluctuation in the reported data, but can also repro- 
duces mini-outbreaks such as those caused by so-called "super- 
sprcadcrs" and in the Hong Kong housing estate Amoy Gardens. 
key words: SARS, surrogates, epidemic models, small world 
networks 

1. SARS in the SAR 

The redundantly named Severe Acute Respiratory 
Syndrome (SARS) [1] appeared in the Guangdong 
province of China in November 2002 and spread to the 
Hong Kong SAR*. From Hong Kong, the virus spread 
throughout the world: largely due to the airline (small 
world) network hub in Hong Kong. The economic and 
social effects of the SARS virus have been subject of 
much attention in the popular media [2] and have been 
widely reported. Less well reported, but no less un- 
usual is the effect this disease had on academic activi- 
ties, including the cancellation of the 2003 International 
Symposium on Nonlinear Theory and its Applications. 

In this report we analyse the SARS daily infec- 
tion data for Hong Kong and test the validity of three 
distinct types of models: (i) stochastic models gen- 
erated from surrogate data; (ii) standard susceptible- 
infected-removed (SIR) models; and (iii) small world 
network models. We find that the small world network 
models are the only models capable of reproducing the 
quantitative behaviour of the SARS epidemic in Hong 
Kong. Moreover, these models also exhibit many fea- 
tures characteristic of this epidemic: a small fraction of 
individuals show a very great propensity to infect others 
(the "super-spreaders"); and, propagation of the virus 
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Fig. 1 Daily reported SARS infections. The top plot 

shows the daily number of SARS infections in Hong Kong. The 
lower plot are the daily number of infections in Taiwan (dot- 
dashed), Singapore (dashed), and the Chinese mainland (solid). 
For the purpose of comparison, the numbers on the lower plot are 
normalised by the maximum daily infection tally. The maximum 
daily infection tally for Taiwan, Singapore and China was 26, 13 
and 203. The data for Taiwan and Singapore has been revised 
based on case information after the epidemic, this has not been 
possible for the China data. 



within certain physical locations led to a large number 
of infections (the outbreaks in Prince of Wales Hospital 
and Amoy Gardens Housing Estate). 

Figure 1 shows the daily infection rate for SARS 
in Hong Kong between 15 February 2003 and 15 June 
2003. Compared to the data for other territories (lower 
panel of Fig. 1), the data for Hong Kong appears to 
be more complete (compared to Taiwan), more accu- 
rate (compared to the Chinese mainland**) and over 
a longer time period (compared to Singapore). The 
data for Hong Kong is shown as two time series, the 
original daily number of infections, as recorded by the 
Hong Kong Department of Health, and revised figures 
released after a more detailed analysis of the true infec- 
tion path [3]. The revised data offers a more accurate 
picture of the true daily number of infections, but this 
data uses information not available at the time of the 
outbreak and we therefore do not analyse that data in 

**In fact, various sources for the Chinese mainland data 
showed widely divergent estimates. Furthermore, subse- 
quent analysis has shown that the original data significantly 
ofer-estimated the effect of the virus. 
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Fig. 2 Inter-day variability. Variability between daily val- 
ues plotted as the difference xt — xt+i (top plot), or log ratio 
log ^^^^ (lower plot). 

this report. 

For the current analysis, let Xt denote the num- 
ber of infections reported in the Hong Kong media 
(based on statistics published by the Hong Kong de- 
partment of Health) for day t and we start with Xq = 0. 
Hence t is the number of days since 11 March 2003 
(March 12 is the first day of recorded infections in Hong 
Kong, infections prior to this date were only identi- 
fied in the revised data). For notational convenience 
we denote the entire time series as {xt}t — {a^tltlo = 

{xo,Xi,X2, . ■ ■,Xn)- 

The remainder of this paper describes the analy- 
sis of the original data in Fig. 1(a) with three differ- 
ent model structures. In the next section we describe 
the application of surrogate data techniques to test the 
Hong Kong SARS time series against the hypothesis 
that inter-day variability in the reported incidence of 
SARS is random. In Section 3 we assess the accuracy 
of the SIR model and in Section 4 we describe the small 
world network model. 

2. Random variability 

The simplest model of disease propagation is that of a 
random walk such that xq = xn = and Xt > for 
t = 0, . . . , iV. The duration of the disease, N, is es- 
sentially a random variable. The inter-day variation is 
random and, according to the Central Limit Theorem, 
the disease will eventually die out. 

Our first task is to transform the data in Fig. 1 to 
have zero mean and be approximately independent and 
identically distributed (i.i.d.). We do this by taking the 
difference of successive values 

(1) 

Zt = Xt- Xt+l 

or taking the log-ratio of successive values 

(2) , Xt 

zi ' = log . 

Xt+\ 



Both {^t^^jt and {zj^"*}* are depicted in Fig. 2. We note 
that because there is an underlying pattern in {|a;t|}t 
we expect that {z\ will offer a time series closer to 

i.i.d than \z^^^^t■ 

We now apply the method of surrogate data [4], 
[5] to determine whether the transformed data in Fig. 
2 is statistically distinct from i.i.d. noise. Note that 
this is equivalent to testing whether the raw time series 
{|x(|}t is a random walk. 

To test for i.i.d. noise the method of surrogate data 
specifies that one should generate an ensemble of surro- 
gate data sets. Each surrogate data set is a realisation 
of an i.i.d. process, but is otherwise "like" the original 
data. From this ensemble of surrogates one estimates 
some statistical quantity and compares this distribu- 
tion of statistic values to the statistic value for the true 
data. If the true data has a statistical value typical of 
the surrogates then the underlying hypothesis (in this 
case, that the data is i.i.d.) may not be rejected. If the 
data is atypical of the surrogate distribution then the 
hypothesis may be rejected. 

To generate suitable surrogates, we wish to destroy 
temporal correlation but preserve the data's probabil- 
ity density function. The simplest way to do this is 
by randomly shuffling either without [4] or with [6] re- 
placement. The choice of suitable test statistic is also 
important [7]. In our past experience we have advo- 
cated correlation dimension [7] or other dynamic invari- 
ants [5]. However, for such extremely short time series 
this is inappropriate. We attempted to use nonlinear 
complexity [8] as a measure, but found that this index 
exhibited extremely low sensitivity. We instead settle 
on the normalised covariance between xt and xt+i 

< XtXt+l > 

<xl> 

One further advantage of this simple measure is that 
it is sensitive to linear dependence: exactly the prop- 
erty we wish to detect. Unfortunately, this also means 
that testing more complex linear surrogate hypotheses 
will not be possible. Hence, as suggested by Takens [9] 
we prefer to test the model residuals against the i.i.d. 
hypothesis. 

The results of Fig. 3 clearly indicate that the data 
of Fig. 2 is not consistent with an i.i.d. noise source. 
Therefore the original data (Fig. 1) may not be mod- 
elled as a simple random walk. 

It would be logical to continue this analysis with 
more sophisticated surrogate algorithms [4], [10]. How- 
ever, for this short time series results of more sophis- 
ticated algorithms are not conclusive. Furthermore, as 
we have already noted, it would not be possible to ap- 
ply the covariance as a discriminating statistic in these 
cases, as the surrogate algorithms explicitly preserve 
autocorrelation. Instead, in the following section we 
consider a simple deterministic model of disease prop- 
agation. 
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Fig. 3 Surrogate calculations. The normalised covariance 
of xt and xt+i, is plotted for the two time series in Fig. 2, along 
with 1000 shuffled surrogates. In both cases the hypothesis that 
the data in Fig. 2 is independent and identically distributed noise 
is soundly rejected (p > 0.95). 



3. SIR models 

The standard Susceptible-Infected-Removed (SIR) 
model of disease propagation has a long history and 
is considered in detail in any standard text on mathe- 
matical biology [11]. In difference equation form, the 
relationship between the three populations at time t 
may be expressed as 

St+i \ ( St{l~rh) \ 
It+i = {rSt + il-a))h (1) 
Rt+i J V Rt + alt J 

where St, It, Rt sue the number (or proportion) of 
individual that are susceptible to infection, infected 
and "removed" respectively. Not that, by removed, we 
mean individuals that have been infected but are no 
longer infectious or susceptible. Such individuals may 
be cured (and immune), quarantined, or dead. The pa- 
rameters r and a are the infection rate and the removal 
rate respectively. 

Using the SIR data wc obtain a maximum likeli- 
hood estimate (i.e. minimum least squares fit) of the 
parameters a and r for the Hong Kong SARS time se- 
ries. The time series in Fig. 1 is the number of new 
infections reported each day. This corresponds to the 
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Fig. 4 SIR models of SARS. The top plot shows the daily 
number of SARS infections in Hong Kong and the best fit SIR 
model. The lower shows five simulations from this model with 
multiplicative noise (described in the text) on the daily number 
of new infections 

term rStIt in Eqn. (1). The numerical fitting proce- 
dure often estimates (incorrectly) a monotonically in- 
creasing of decreasing function. To prevent these so- 
lutions, we add leading and trailing zeros to the data 
prior to fitting. This helps to ensure that the optimal 
model exhibits a single extremum — not at either end- 
point. An alternative to fitting rSth to the observed 
data would be to also consider the daily fatality and 
recovery data for Hong Kong as an approximation for 
ARt- One could then fit Alt and ARt directly from 
the observed data. 

In Fig. 4 (a) we see the time course of the best 
model (Eqn. (1)) fitted from this data. Clearly, this 
deterministic model does not capture the random vari- 
ation observed in the data. 

One way of considering this random variation is 
that the SIR model provides an estimate of the rate of 
infection (i.e. It+i = [rSt + (1 — a))It), but that this 
estimate is subject to random fluctuations. It may be 
better therefore to perturb the number of new infections 
by some unknown quantity tt ■ The modified, stochastic 
SIR equations then read 

St+i \ ( St{l-rlt)~et \ 
It+i = [rSt + {1- a))It +et ]. (2) 
Rt+i J \ Rt + alt J 

A similar random term could be added to the rate of 
removal, however we do not consider this problem here. 
Moreover, arbitrarily complex noise inputs could be de- 
vised that would eventually reproduce features of the 
observed data. Our concern is only whether these fea- 
tures can be observed from the SIR model, or immedi- 
ate generalisations of this model. 

We found that simulations with et ^ N{0, <t^) for a 
constant did not produce realistic simulations. The 
problem is that the magnitude of the random fiuctua- 
tion should be proportional to the number of infected 



4 



lEICE TRANS. ??, VOL.Exx-??, NO.xx XXXX 200x 




-0.6 -0.4 -0.2 0.2 04 0.6 08 



Fig. 5 Surrogate calculations for stochastic SIR data. 

The normalised covariance of xt and xt+i, is plotted for the data 
along with 1000 independent model simulations (this is a similar 
calculation to that depicted in Fig. 3 (b)). The statistics in panel 
(a) and (b) differ. In panel (a) the normalise covariance of the 
data is estimated, in panel (b) the normalised covariance of the 
log-ratio of the data is computed. In both cases the hypothesis 
that the SIR model is a good model is soundly rejected (p > 0.95). 



individuals. Few infected individuals are unlikely to 
have a great variation in effect, but many such individ- 
uals will. In Fig. 4 (b) we show five simulations with 
the noise term et ^ 7V(0, (/fcr)^). To ensure that the 
model is physically realistic (i.e. non-negative popu- 
lations) we also restrict et > —{rSt + (1 — a))It. We 
found that a w 0.25 gave quantitatively reasonable re- 
sults and this is the value employed in Fig. 4. 

To quantitatively evaluate the effectiveness of the 
SIR type models (Eqn. (2)) we apply a variant on 
the method of surrogate data [7]. However, instead of 
generating constrained realisations consistent with the 
null hypothesis^, we generate multiple realisations of 
the model and compare these realisations to the data 
[9]. For the model described by Eqn. (2) we gener- 
ate 1000 simulations and for each simulation compute 
< XtXt+i > and < ZtZj+i > where Zt = log That 
is, we use two distinct test statistics. Results for both 

^See [12] for a description of this rather exclusive nomen- 
clature. Informally this means we do not force the surro- 
gates to preserve specific properties of the data. This means 
that we are unable to reject all SIR models with this test, 
only the SIR models that are fit to this data. 
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Fig. 6 Variable parameters (a,r). The top plot shows the 
best fit estimates of the parameters a and r where the parameters 
are fitted to a window of 10 successive data. The lower plot shows 
a deterministic prediction using these variable values. 



statistics are plotted in Fig. 5 show a clear distinc- 
tion between data and surrogates. Since this model is 
the optimal fit to the data and the distribution of co- 
variance estimates do not vary greatly with change in 
model parameters''"'' , we therefore conclude that models 
of the form Eqn. (2) are not adequate to describe the 
underlying dynamics. 

One widely accepted reason for the failure of the 
SIR models is that the parameters a and r do not re- 
main constant during the time course of an epidemic. 
Particularly when a disease is previously unknown and 
public health practice changes significantly in response 
to the spread of the disease. This is certainly consistent 
with the situation for SARS in Hong Kong. We there- 
fore modify the SIR model of Eqn. (1) by estimating 
the temporal dependence of a and r. We apply a sliding 
window of width 10 days and re-estimate a and r for 
every day. Figure 6 depicts the results of this computa- 
tion. The values at and rt are based on the best fit esti- 
mate of Eqn. (1) to the data {xt, Xt+i,Xt+2, • ■ ■ , Xt+io). 

While the simulations in Fig. 6 (b) does have the 
correct shape, if not the correct magnitude, the val- 
ues of a and r depicted in Fig. 6 (a) illustrate the 
problem with this approach. The parameter values 
are sensitively dependent on the observed data and do 
not reflect reality. In a separate publication we mod- 
ify this approach by considering longer windows and 
have shown the utility of the ratio r/a for estimat- 
ing the effectiveness of various governmental control 
strategies [13]. A combination of the random fluctu- 
ations approach of Eqn. (2) and the variability in a 
and r may produce more realistic simulations. How- 
ever, we feel that this would represent a significant over- 
parameterisation of this modelling problem. 

For the current investigation we conclude that a 
general SIR model is unable to capture the underly- 

^'^That is, the statistic is approximately pivotal [7]. 
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ing dynamics of the system. More exotic SIR inspired 
models, or more contrived noise inputs, may be able 
to reproduce the observed behaviour. Indeed, we ex- 
pect this should be the case when the system becomes 
complex enough. But this is beyond the focus of the 
current communication and is in general a less inter- 
esting problem. In the following section we consider a 
computationally more extensive model that is inspired 
by the physical situation and provides a more realistic 
alternative. 

4. Small virorld networks 

The final model structure we examine is computation- 
ally more complex, but also more realistic. In contrast 
to potentially exotic modified SIR models, the model 
we introduce here is inspired by our understanding of 
the physical situation. Small world networks (SWN) 
[14] have recently become immensely popular and re- 
ceived significant attention in a wide variety of fields 
[15] . One system which has provided evidence of small 
world structure is the network of social interaction be- 
tween individuals (the so-called "six degrees of separa- 
tion"). It is therefore natural to consider propagation 
of disease epidemics between nodes of a SWN. 

In the following SWN we suppose that each node 
is either susceptible to disease (S), infected (I) or re- 
moved (R). However, unlike the standard SIR struc- 
ture, nodes of the SWN can only infect those that they 
are acquainted with. We arrange the nodes in a rect- 
angular grid (for Hong Kong, we use a square with side 
length 2700). On a given day, each node can infect ni of 
its four immediate neighbours (horizontally and verti- 
cally) with constant probability pi . Each node also has 
a random number of long range acquaintances. The 
number of long range acquaintances follows an expo- 
nential distribution with an expected value of n2 and 
the acquaintances are chosen at random but fixed for 
each node. On each day, each of the long range acquain- 
tances may be infected with some constant probability 
P2 ■ Finally, on each day, every infected node has proba- 
bility ri of becoming removed. As with the SIR model, 
the only possible transitions are S I ^ R. 

For our first simulations we simplify this structure 
by setting ri = 0. To avoid eventual complete con- 
tamination we vary (i.e. decrease) pi and p2 with time 
according to the scheme in Fig. 7 (b). The values de- 
picted in Fig. 7 are selected only because they yield 
a model with has a realistic behaviour. Figure 7 (a) 
compares five random realisations of ASt to the ob- 
served data. We see that the quantitative features are 
reproduced nicely. 

In Fig. 8 (a) we repeat the surrogate calculations of 
the previous sections for 1000 realisations of the process 
depicted in Fig. 7. An examination of the variation in 
the estimated covariance shows that the data and model 
simulations are in very close agreement. We are there- 
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Fig. 7 Variable parameters (pi,P2)- The top plot is the 
Hong Kong SARS data together with five simulations of small 
world network model with the variables pi and p2 changing with 
time (presumably in response to changing public health practice). 
The bottom plot shows the variation in the parameters pi and 
P2 used in the top plot. These values were selected using an ad 
hoc process. 
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Fig. 8 Surrogate calculations. The top plot shows 1000 
simulations of the small world network with variable pi and p2 
(and ri =0), the bottom plot shows estimates for 1000 sim- 
ulations from the scale-free small world network with constant 
parameters (pi = 0.05, P2 = 0.005 and r = 0.10). 



fore unable to reject the model behaviour as a plausible 
model of propagation of SARS. This good agreement is 
due to a combination of long range correlation present 
in the model data and the same "burstiness" that is 
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also observed in the true data. In Fig. 7 we see that 
both model and simulations exhibit similar large spikes 
when one (or several) so-called "super-spreaders" are 
encountered at random. 

However, this model is not entirely satisfactory. 
The disease epidemic is controlled artificial by suitable 
selection of pi.2(i). Furthermore, no nodes ever become 
"removed". To remedy this we set pi{t) = pi = 0.05, 
P2{t) = P2 = 0.005, and ri =0.1. These values are se- 
lected after a process of trial and error, but do roughly 
correspond to the experience of SARS in Hong Kong. 
The probability pi is the daily probability of infected 
close family members or neighbours, P2 is the daily 
probability of infecting more casual acquaintances. Fi- 
nally, we allow for an exponential distribution of the 
number of neighbours with an expected value of 6. 
Hence the system truly is a scale free network and the 
existence of "super-spreaders" is explicitly modelled by 
this random variable. 

The time series produced by this simulation are 
similar to those of the restricted SWN in Fig. 7. The 
major differences are the inclusion of the removed cate- 
gory and that the infection probabilities do not change 
with time. A constant value for pi,2 may not be en- 
tirely realistic, but is is a statistical necessity to avoid 
over-fitting in such a complex stochastic model. A con- 
sequence of these constant values of pi_2 is that often 
the disease will "die-out" immediately, or, it will run 
out of control and infect the entire community. In sim- 
ulations, both situations are common, but so to is the 
eventual control of the epidemic: without changing the 
infection control policy! 

We do not show typical trajectories of this final 
model because they are similar to those shown in Fig. 
7, only with a slightly larger variability. In Fig. 8 we 
generate 1000 simulations of length greater than 90 (i.e. 
simulations that immediately die out are rejected) but 
no longer than 120 (longer simulations were truncated) 
of this SWN system (m = 4, E{n2) = 10, pi = 0.25, 
P2 = 0.001, and ri = 0.45) ^ and compare the covari- 
ance of ASt to the true data. In each case we are unable 
to reject the underlying SWN model. A slight increase 
in variability (when compared to the previous model 
with artificially variable ^1^2) is observed in this second 
structure and we believe that this is due to the lower 
level of "control" on the parameter values and the in- 
clusion of both simulations that die out, and those that 
run out of control in the analysis. 



'Simulations were repeated with a variety of other values 
and a wide variety of dynamics behaviour. Most notably the 
parameter values ni = 4, E{n2) — 6, pi = 0.05, p2 = 0.005, 
and ri =0.1 produced surrogate results similar to those in 
Fig. 8. These two sets of parameter values where selected 
as they reflected the authors' understandings of the physical 
environment. 



5. Conclusion 

The daily reported SARS figures for Hong Kong have 
been considered. We find that the data is not con- 
sistent with a random walk or a noisy version of the 
standard SIR model. However, two SWN structures do 
exhibit dynamics indistinguishable from the true data. 
To accurately simulate the spread of disease in society, 
including features observed during the SARS epidemic 
(such as rapid, extensive and localised outbreaks), one 
needs to consider complex SWN type structures. The 
power of the SIR model lies in its ability to model large 
scale spread of a contagion, the SWN structure de- 
scribed here is much better at simulating the localised 
variability. This localised variability, inherent to the 
model simulations may allow administrators and fore- 
casters to estimate possible variation in their predic- 
tions of disease dynamics (i.e. to provide error bars 
on their projections and evaluate the likelihood of vari- 
ous scenarios). Such techniques are certain to be useful 
when planning and implementing control strategies. In 
a related publication, we provide a more detailed anal- 
ysis of the potential scale- free structure of various SWN 
models [13]. Furthermore, in this paper we propose the 
variable parameter SWN model is actually inspired by 
governmental public health practice [13]. In this cur- 
rent we our primary goal is to show that SWN mod- 
els are able to capture the underlying dynamics of the 
SARS epidemic better than either simple stochastic or 
SIR models. 
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